Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture Recognition


Abstract

Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators of human behavior. Recent studies on the recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling the spatial configuration of body parts representing body pose, human-object interactions, and variations in local appearance. The results show that this is a brittle approach, since it relies on accurate detection of body parts/objects. In this work, we argue that there exist local discriminative semantic regions whose “informativeness” can be evaluated by an attention mechanism for inferring fine-grained gestures/actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), a fully convolutional neural network (CNN) that combines multiple contextual regions through an attention mechanism, focusing on the parts of the image that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing the HOG (Histogram of Oriented Gradients) descriptor. The model is extensively evaluated on ten datasets belonging to 3 different scenarios: 1) head pose recognition, 2) driver state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state of the art by a considerable margin in different metrics.
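The idea of building regions from consecutive grid cells and weighting them by attention can be illustrated with a small NumPy sketch. This is not the authors' implementation: the grid size, the `max_cells` limit, average pooling per region, and the single linear scoring vector `w` are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import numpy as np

def enumerate_regions(grid_rows, grid_cols, max_cells=2):
    """List rectangles (r0, c0, r1, c1) that span 1..max_cells consecutive
    grid cells per side, loosely analogous to HOG blocks built from cells."""
    regions = []
    for h in range(1, max_cells + 1):
        for w in range(1, max_cells + 1):
            for r0 in range(grid_rows - h + 1):
                for c0 in range(grid_cols - w + 1):
                    regions.append((r0, c0, r0 + h, c0 + w))
    return regions

def region_features(fmap, grid_rows, grid_cols, regions):
    """Average-pool an (H, W, C) feature map over each region (assumed pooling)."""
    H, W, _ = fmap.shape
    ch, cw = H // grid_rows, W // grid_cols  # cell height and width in pixels
    feats = [
        fmap[r0 * ch:r1 * ch, c0 * cw:c1 * cw].mean(axis=(0, 1))
        for r0, c0, r1, c1 in regions
    ]
    return np.stack(feats)  # shape: (num_regions, C)

def attention_pool(feats, w):
    """Score each region with a linear layer (assumed), softmax the scores,
    and return the attention-weighted sum of region features."""
    scores = feats @ w
    scores = scores - scores.max()  # subtract the max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ feats, alpha

# Toy input: a random 8x8 feature map with 16 channels on a 4x4 cell grid.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))
regions = enumerate_regions(4, 4, max_cells=2)
feats = region_features(fmap, 4, 4, regions)
w = rng.normal(size=16)  # stands in for learned attention parameters
pooled, alpha = attention_pool(feats, w)
```

In a trained network the scoring parameters would be learned jointly with the CNN backbone, so regions that help the task receive larger weights in `alpha`; here they are random and only the shapes and the normalization are meaningful.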


Similar resources

Fine-grained pose prediction, normalization, and recognition

Pose variation and subtle differences in appearance are key challenges to fine-grained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previou...


Fine-Grained Head Pose Estimation Without Keypoints

Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is ...


Fine-Grained Activity Recognition with Holistic and Pose Based Features

Holistic methods based on dense trajectories [29, 30] are currently the de facto standard for recognition of human activities in video. Whether holistic representations will sustain or will be superseded by higher level video encoding in terms of body pose and motion is the subject of an ongoing debate [12]. In this paper we aim to clarify the underlying factors responsible for good performance...


Attention for Fine-Grained Categorization

This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, beginning with fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the...


Head Gesture Recognition Based on Bayesian Network

Head gestures such as nodding and shaking are often used as a form of human body language for communication, and their recognition plays an important role in the development of Human-Computer Interaction (HCI). As a head gesture is a continuous motion over a time series, the key problems of recognition are to track multi-view head and understand the head pose transformat...



Journal

Journal title: IEEE Transactions on Affective Computing

Year: 2023

ISSN: 1949-3045, 2371-9850

DOI: https://doi.org/10.1109/taffc.2020.3031841